Discovering Maximal Generalized Decision Rules Through Horizontal and Vertical Data Reduction
Authors
Abstract
We present a method for learning maximal generalized decision rules from databases by integrating discretization, generalization, and rough set feature selection. The method reduces the data both horizontally and vertically. In the first phase, discretization and generalization are integrated: numeric attributes are discretized into a few intervals, the primitive values of symbolic attributes are replaced by high-level concepts, and obviously superfluous or irrelevant symbolic attributes are eliminated. Horizontal reduction is accomplished by merging tuples that become identical after a symbolic attribute value is replaced by its higher-level value in a predefined concept hierarchy, or after a continuous (numeric) attribute is discretized. This phase greatly decreases the number of tuples in the database. In the second phase, a novel context-sensitive feature merit measure is used to rank the features, and a subset of relevant attributes is chosen based on rough set theory and the merit values of the features. Removing the attributes outside this subset yields a reduced table, so the data set is further reduced vertically without destroying the interdependence relationships between the classes and the attributes. Rough set-based value reduction is then performed on the reduced table, and all redundant condition values are dropped. Finally, the tuples in the reduced table are transformed into a set of maximal generalized decision rules. Experimental results on UCI data sets and a real market database demonstrate that the method can dramatically reduce the feature space and improve learning accuracy.
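The two reduction directions described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the concept hierarchy `CITY_TO_REGION`, the cut points `AGE_CUTS`, the sample `rows`, and all function names are hypothetical, and the standard rough set dependency degree stands in for the paper's context-sensitive merit measure, which the abstract does not define.

```python
from collections import Counter

# Hypothetical concept hierarchy: primitive value -> higher-level concept.
CITY_TO_REGION = {
    "Vancouver": "West", "Calgary": "West",
    "Toronto": "East", "Ottawa": "East",
}

# Hypothetical cut points and interval labels for the numeric attribute "age".
AGE_CUTS = [30, 50]
AGE_LABELS = ["young", "middle", "senior"]


def discretize(value, cuts, labels):
    """Return the label of the first interval whose upper cut exceeds value."""
    for cut, label in zip(cuts, labels):
        if value < cut:
            return label
    return labels[-1]


def horizontal_reduce(rows):
    """Generalize every tuple, then merge identical tuples and keep a count."""
    generalized = [
        (discretize(age, AGE_CUTS, AGE_LABELS), CITY_TO_REGION[city], decision)
        for age, city, decision in rows
    ]
    return Counter(generalized)


def dependency_degree(table, condition_idx):
    """Fraction of tuples whose chosen condition values fix the decision.

    This is the usual rough set dependency degree, used here as a stand-in
    for the paper's merit measure (an assumption of this sketch).
    """
    decisions = {}
    for row in table:
        key = tuple(row[i] for i in condition_idx)
        decisions.setdefault(key, set()).add(row[-1])
    total = sum(table.values())
    consistent = sum(
        count for row, count in table.items()
        if len(decisions[tuple(row[i] for i in condition_idx)]) == 1
    )
    return consistent / total


# Made-up sample data: (age, city, decision).
rows = [
    (25, "Vancouver", "buy"), (28, "Calgary", "buy"),
    (45, "Toronto", "skip"), (47, "Ottawa", "skip"),
    (62, "Toronto", "skip"), (65, "Vancouver", "buy"),
]

table = horizontal_reduce(rows)            # six tuples merge into four
full = dependency_degree(table, [0, 1])    # age interval and region together
region_only = dependency_degree(table, [1])
age_only = dependency_degree(table, [0])
```

In this toy table the region attribute alone preserves the full dependency on the decision, so a vertical reduction step could drop the age attribute, while the age attribute alone could not replace the region.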
Similar resources
Discovering Maximal Frequent Item set using Association Array and Depth First Search Procedure with Effective Pruning Mechanisms
The first step of association rule mining is finding all frequent itemsets. The generation of reliable association rules is based on the frequent itemsets found in this first step, and obtaining all frequent itemsets in a large database dominates the overall performance of association rule mining. In this paper, an efficient method for discovering the maximal frequent itemsets is proposed. This met...
Towards Scalable Algorithms for Discovering Rough Set Reducts
Rough set theory allows one to find reducts from a decision table, which are minimal sets of attributes preserving the required quality of classification. In this article, we propose a number of algorithms for discovering all generalized reducts (preserving generalized decisions), all possible reducts (preserving upper approximations) and certain reducts (preserving lower approximations). The n...
Decision Mining Revisited - Discovering Overlapping Rules
Decision mining enriches process models with rules underlying decisions in processes using historical process execution data. Choices between multiple activities are specified through rules defined over process data. Existing decision mining methods focus on discovering mutually-exclusive rules, which only allow one out of multiple activities to be performed. These methods assume that decision ...
Galois Connection in Fuzzy Binary Relations, Applications for Discovering Association Rules and Decision Making
The Galois connection in crisp binary relations has proved useful for several applications in computer science. Unfortunately, data is not always presented as a crisp binary relation but may be composed of fuzzy values, thus forming a fuzzy binary relation. This paper aims at defining the notion of a fuzzy Galois connection corresponding to a fuzzy binary relation in two steps: firstly by defin...
A Probabilistic Rough Set Approach to Rule Discovery
Rough set theory is a relatively new tool that deals with the vagueness and uncertainty inherent in decision making. This paper introduces a new probabilistic approach for reducing dimensions and extracting rules of information systems using expert systems. The core of the approach is a soft hybrid induction system called the Generalized Distribution Table and Rough Set System (GDT-RS) for discovering...
Journal title:
- Computational Intelligence
Volume 17, Issue
Pages -
Publication date 2001